Workshop Track -iclr 2017 Neural Combinatorial Optimization with Reinforcement Learning
نویسندگان
چکیده
We present a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent neural network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent neural network using a policy gradient method. Without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes. These results, albeit still quite far from state-of-the-art, give insights into how neural networks can be used as a general tool for tackling combinatorial optimization problems.
منابع مشابه
Multi-task learning with deep model based reinforcement learning
In recent years, model-free methods that use deep learning have achieved great success in many different reinforcement learning environments. Most successful approaches focus on solving a single task, while multi-task reinforcement learning remains an open problem. In this paper, we present a model based approach to deep reinforcement learning which we use to solve different tasks simultaneousl...
متن کاملNeural Combinatorial Optimization with Reinforcement Learning
This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent neural network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the rec...
متن کاملLearning Evaluation Functions
Evaluation functions are an essential component of practical search algorithms for optimization, planning and control. Examples of such algorithms include hillclimb-ing, simulated annealing, best-rst search, A*, and alpha-beta. In all of these, the evaluation functions are typically built manually by domain experts, and may require considerable tweaking to work well. I will investigate the thes...
متن کاملEnd-to-End Optimization of Task-Oriented Dialogue Model with Deep Reinforcement Learning
In this paper, we present a neural network based task-oriented dialogue system that can be optimized end-to-end with deep reinforcement learning (RL). The system is able to track dialogue state, interface with knowledge bases, and incorporate query results into agent’s responses to successfully complete task-oriented dialogues. dialogue policy learning is conducted with a hybrid supervised and ...
متن کاملSalient Object Subitizing: Supplementary Material 1. Visualizing the CNN Subitizing Classifiers
[1] A. Karpathy. t-SNE visualization of CNN. http://cs.stanford.edu/people/karpathy/ cnnembed/. [2] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS), 2012. [3] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. ...
متن کامل